Overview

Dataset statistics

Number of variables: 18
Number of observations: 1241
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 174.6 KiB
Average record size in memory: 144.1 B

Variable types

Categorical: 4
Numeric: 14

Alerts

Village has a high cardinality: 1210 distinct values (High cardinality)
pH is highly correlated with WQI (High correlation)
EC is highly correlated with TDS and 10 other fields (High correlation)
TDS is highly correlated with EC and 10 other fields (High correlation)
TH is highly correlated with EC and 10 other fields (High correlation)
Alkalinity is highly correlated with EC and 8 other fields (High correlation)
Calcium is highly correlated with EC and 6 other fields (High correlation)
Magnesium is highly correlated with EC and 9 other fields (High correlation)
Sodium is highly correlated with EC and 9 other fields (High correlation)
Bicarbonate is highly correlated with EC and 8 other fields (High correlation)
Chloride is highly correlated with EC and 7 other fields (High correlation)
Sulphate is highly correlated with EC and 8 other fields (High correlation)
Fluoride is highly correlated with WQI (High correlation)
is_drinkable is highly correlated with EC and 9 other fields (High correlation)
WQI is highly correlated with pH and 10 other fields (High correlation)
EC is highly correlated with TDS and 10 other fields (High correlation)
TDS is highly correlated with EC and 10 other fields (High correlation)
TH is highly correlated with EC and 8 other fields (High correlation)
Alkalinity is highly correlated with EC and 7 other fields (High correlation)
Calcium is highly correlated with EC and 3 other fields (High correlation)
Magnesium is highly correlated with EC and 5 other fields (High correlation)
Sodium is highly correlated with EC and 5 other fields (High correlation)
Potassium is highly correlated with WQI (High correlation)
Bicarbonate is highly correlated with EC and 7 other fields (High correlation)
Chloride is highly correlated with EC and 6 other fields (High correlation)
Sulphate is highly correlated with EC and 4 other fields (High correlation)
Fluoride is highly correlated with WQI (High correlation)
is_drinkable is highly correlated with EC and 5 other fields (High correlation)
WQI is highly correlated with EC and 6 other fields (High correlation)
EC is highly correlated with TDS and 10 other fields (High correlation)
TDS is highly correlated with EC and 10 other fields (High correlation)
TH is highly correlated with EC and 7 other fields (High correlation)
Alkalinity is highly correlated with EC and 6 other fields (High correlation)
Calcium is highly correlated with EC and 2 other fields (High correlation)
Magnesium is highly correlated with EC and 5 other fields (High correlation)
Sodium is highly correlated with EC and 3 other fields (High correlation)
Bicarbonate is highly correlated with EC and 5 other fields (High correlation)
Chloride is highly correlated with EC and 4 other fields (High correlation)
Sulphate is highly correlated with EC and 2 other fields (High correlation)
Fluoride is highly correlated with WQI (High correlation)
is_drinkable is highly correlated with EC and 7 other fields (High correlation)
WQI is highly correlated with EC and 4 other fields (High correlation)
is_drinkable is highly correlated with WQC (High correlation)
WQC is highly correlated with is_drinkable (High correlation)
District is highly correlated with pH and 3 other fields (High correlation)
pH is highly correlated with District (High correlation)
EC is highly correlated with TDS and 12 other fields (High correlation)
TDS is highly correlated with EC and 12 other fields (High correlation)
TH is highly correlated with EC and 8 other fields (High correlation)
Alkalinity is highly correlated with District and 11 other fields (High correlation)
Calcium is highly correlated with EC and 4 other fields (High correlation)
Magnesium is highly correlated with EC and 9 other fields (High correlation)
Sodium is highly correlated with EC and 7 other fields (High correlation)
Potassium is highly correlated with EC and 5 other fields (High correlation)
Bicarbonate is highly correlated with District and 11 other fields (High correlation)
Chloride is highly correlated with EC and 7 other fields (High correlation)
Sulphate is highly correlated with EC and 4 other fields (High correlation)
Fluoride is highly correlated with Alkalinity and 4 other fields (High correlation)
is_drinkable is highly correlated with EC and 9 other fields (High correlation)
WQI is highly correlated with EC and 10 other fields (High correlation)
WQC is highly correlated with District and 9 other fields (High correlation)
Village is uniformly distributed (Uniform)
WQI has unique values (Unique)
Sulphate has 82 (6.6%) zeros (Zeros)

Reproduction

Analysis started: 2022-07-24 13:23:20.408365
Analysis finished: 2022-07-24 13:24:24.568987
Duration: 1 minute and 4.16 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json

Variables

District
Categorical

HIGH CORRELATION

Distinct: 30
Distinct (%): 2.4%
Missing: 0
Missing (%): 0.0%
Memory size: 9.8 KiB

Value              Count
Ganjam             81
Sambalpur          79
Mayurbhanj         75
Sundargarh         68
Bargarh            68
Other values (25)  870

Length

Max length13
Median length10
Mean length7.707493956
Min length4

Characters and Unicode

Total characters: 9565
Distinct characters: 32
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
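
As a rough illustration of how such character properties can be read off, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. The function name `char_profile` and the `CATEGORY_NAMES` mapping are hypothetical helpers introduced here, and the stdlib module is a stand-in, not necessarily what the profiling tool uses internally. The three input strings are values that appear in this report.

```python
import unicodedata
from collections import Counter

# Hypothetical mapping from two-letter general-category codes to the
# long names used in the report tables (only the codes needed here).
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Pd": "Dash Punctuation",
    "Zs": "Space Separator",
}

def char_profile(values):
    """Count characters and their Unicode general categories across strings."""
    chars = Counter()
    categories = Counter()
    for value in values:
        for ch in value:
            chars[ch] += 1
            categories[unicodedata.category(ch)] += 1
    return chars, categories

# Values taken from the District and Village columns of this report.
sample = ["Angul", "Khamar-1", "R-33 Sector 18"]
chars, categories = char_profile(sample)

for code, count in categories.most_common():
    print(CATEGORY_NAMES.get(code, code), count)
```

Counting per code point like this is what makes the "Distinct categories", "Most occurring categories", and per-category tables possible for a text column.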

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st rowAngul
2nd rowAngul
3rd rowAngul
4th rowAngul
5th rowAngul

Common Values

Value              Count  Frequency (%)
Ganjam             81     6.5%
Sambalpur          79     6.4%
Mayurbhanj         75     6.0%
Sundargarh         68     5.5%
Bargarh            68     5.5%
Kendujhar          64     5.2%
Puri               62     5.0%
Koraput            62     5.0%
Cuttack            60     4.8%
Khordha            60     4.8%
Other values (20)  562    45.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
ganjam81
 
6.5%
sambalpur79
 
6.4%
mayurbhanj75
 
6.0%
sundargarh68
 
5.5%
bargarh68
 
5.5%
kendujhar64
 
5.2%
puri62
 
5.0%
koraput62
 
5.0%
cuttack60
 
4.8%
khordha60
 
4.8%
Other values (20)562
45.3%

Most occurring characters

ValueCountFrequency (%)
a2030
21.2%
r1022
 
10.7%
u731
 
7.6%
h617
 
6.5%
n617
 
6.5%
d396
 
4.1%
g368
 
3.8%
p332
 
3.5%
l299
 
3.1%
j280
 
2.9%
Other values (22)2873
30.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter8324
87.0%
Uppercase Letter1241
 
13.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a2030
24.4%
r1022
12.3%
u731
 
8.8%
h617
 
7.4%
n617
 
7.4%
d396
 
4.8%
g368
 
4.4%
p332
 
4.0%
l299
 
3.6%
j280
 
3.4%
Other values (10)1632
19.6%
Uppercase Letter
ValueCountFrequency (%)
K259
20.9%
B204
16.4%
S199
16.0%
G113
9.1%
M96
 
7.7%
N93
 
7.5%
P62
 
5.0%
C60
 
4.8%
J49
 
3.9%
A45
 
3.6%
Other values (2)61
 
4.9%

Most occurring scripts

ValueCountFrequency (%)
Latin9565
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
a2030
21.2%
r1022
 
10.7%
u731
 
7.6%
h617
 
6.5%
n617
 
6.5%
d396
 
4.1%
g368
 
3.8%
p332
 
3.5%
l299
 
3.1%
j280
 
2.9%
Other values (22)2873
30.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII9565
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a2030
21.2%
r1022
 
10.7%
u731
 
7.6%
h617
 
6.5%
n617
 
6.5%
d396
 
4.1%
g368
 
3.8%
p332
 
3.5%
l299
 
3.1%
j280
 
2.9%
Other values (22)2873
30.0%

Village
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 1210
Distinct (%): 97.5%
Missing: 0
Missing (%): 0.0%
Memory size: 9.8 KiB

Value                Count
Nuagaon              4
Jagannathpur         3
Kharmanda            3
Sakhigopal           2
Gosala               2
Other values (1205)  1227

Length

Max length28
Median length23
Mean length9.185334408
Min length3

Characters and Unicode

Total characters: 11399
Distinct characters: 64
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1183
Unique (%): 95.3%

Sample

1st rowChauliakata
2nd rowGodibandha
3rd rowSamal
4th rowSipur
5th rowKhamar-1

Common Values

Value                Count  Frequency (%)
Nuagaon              4      0.3%
Jagannathpur         3      0.2%
Kharmanda            3      0.2%
Sakhigopal           2      0.2%
Gosala               2      0.2%
Sikharpur            2      0.2%
Indupur              2      0.2%
Choudwar             2      0.2%
Harbhanga            2      0.2%
Usbelika             2      0.2%
Other values (1200)  1217   98.1%

Length

Histogram of lengths of the category
Value                Count  Frequency (%)
1                    30     2.1%
nagar                9      0.6%
chhak                7      0.5%
2                    6      0.4%
nuagaon              5      0.3%
bazar                5      0.3%
road                 4      0.3%
chawk                3      0.2%
kharmanda            3      0.2%
temple               3      0.2%
Other values (1309)  1358   94.8%

Most occurring characters

ValueCountFrequency (%)
a2452
21.5%
i813
 
7.1%
r745
 
6.5%
u678
 
5.9%
n642
 
5.6%
h549
 
4.8%
d493
 
4.3%
l449
 
3.9%
p410
 
3.6%
g325
 
2.9%
Other values (54)3843
33.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter9418
82.6%
Uppercase Letter1400
 
12.3%
Decimal Number229
 
2.0%
Space Separator190
 
1.7%
Dash Punctuation113
 
1.0%
Other Punctuation15
 
0.1%
Close Punctuation15
 
0.1%
Open Punctuation15
 
0.1%
Control4
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a2452
26.0%
i813
 
8.6%
r745
 
7.9%
u678
 
7.2%
n642
 
6.8%
h549
 
5.8%
d493
 
5.2%
l449
 
4.8%
p410
 
4.4%
g325
 
3.5%
Other values (16)1862
19.8%
Uppercase Letter
ValueCountFrequency (%)
B242
17.3%
K170
12.1%
S133
9.5%
R96
 
6.9%
G86
 
6.1%
D86
 
6.1%
P84
 
6.0%
J77
 
5.5%
M73
 
5.2%
C68
 
4.9%
Other values (12)285
20.4%
Decimal Number
Value  Count  Frequency (%)
1      84     36.7%
2      37     16.2%
3      27     11.8%
0      18     7.9%
4      17     7.4%
7      11     4.8%
5      11     4.8%
9      10     4.4%
8      7      3.1%
6      7      3.1%
Space Separator
Value    Count  Frequency (%)
(space)  190    100.0%
Dash Punctuation
Value  Count  Frequency (%)
-      113    100.0%
Other Punctuation
Value  Count  Frequency (%)
.      15     100.0%
Close Punctuation
Value  Count  Frequency (%)
)      15     100.0%
Open Punctuation
Value  Count  Frequency (%)
(      15     100.0%
Control
Value                Count  Frequency (%)
(control character)  4      100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin10818
94.9%
Common581
 
5.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
a2452
22.7%
i813
 
7.5%
r745
 
6.9%
u678
 
6.3%
n642
 
5.9%
h549
 
5.1%
d493
 
4.6%
l449
 
4.2%
p410
 
3.8%
g325
 
3.0%
Other values (38)3262
30.2%
Common
Value             Count  Frequency (%)
(space)           190    32.7%
-                 113    19.4%
1                 84     14.5%
2                 37     6.4%
3                 27     4.6%
0                 18     3.1%
4                 17     2.9%
.                 15     2.6%
)                 15     2.6%
(                 15     2.6%
Other values (6)  50     8.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII11399
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a2452
21.5%
i813
 
7.1%
r745
 
6.5%
u678
 
5.9%
n642
 
5.6%
h549
 
4.8%
d493
 
4.3%
l449
 
3.9%
p410
 
3.6%
g325
 
2.9%
Other values (54)3843
33.7%

pH
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION

Distinct185
Distinct (%)14.9%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean7.828622079
Minimum6.46
Maximum8.78
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size9.8 KiB

Quantile statistics

Minimum6.46
5-th percentile7.07
Q17.58
median7.9
Q38.12
95-th percentile8.35
Maximum8.78
Range2.32
Interquartile range (IQR)0.54

Descriptive statistics

Standard deviation0.3996080174
Coefficient of variation (CV)0.05104448948
Kurtosis-0.1158318734
Mean7.828622079
Median Absolute Deviation (MAD)0.26
Skewness-0.6268527131
Sum9715.32
Variance0.1596865675
MonotonicityNot monotonic
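
Several of the figures above are derived from others in the same block. The short sketch below recomputes range, IQR, coefficient of variation, and variance from the reported pH values, as a consistency check. The variable names are chosen here for illustration, and the definitions used (CV as standard deviation over mean, variance as the squared standard deviation) are the standard ones, assumed to match the report's.

```python
# Values copied from the pH summary in this report.
minimum, maximum = 6.46, 8.78
q1, q3 = 7.58, 8.12
mean = 7.828622079
std = 0.3996080174

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range (IQR)
cv = std / mean                   # Coefficient of variation (CV)
variance = std ** 2               # Variance

print(round(value_range, 2), round(iqr, 2), round(cv, 6), round(variance, 6))
```

The recomputed values agree with the reported Range (2.32), IQR (0.54), CV (0.0510...), and Variance (0.1597...) up to rounding.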
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
8.3                 45     3.6%
7.9                 20     1.6%
8.2                 20     1.6%
8.02                19     1.5%
8.1                 19     1.5%
8.08                19     1.5%
8                   18     1.5%
7.94                18     1.5%
7.98                17     1.4%
7.88                16     1.3%
Other values (175)  1030   83.0%

Value  Count  Frequency (%)
6.46   1      0.1%
6.5    1      0.1%
6.54   1      0.1%
6.6    1      0.1%
6.64   1      0.1%
6.71   1      0.1%
6.73   1      0.1%
6.75   1      0.1%
6.77   2      0.2%
6.81   1      0.1%

Value  Count  Frequency (%)
8.78   1      0.1%
8.63   1      0.1%
8.62   1      0.1%
8.61   1      0.1%
8.6    1      0.1%
8.59   2      0.2%
8.58   1      0.1%
8.56   2      0.2%
8.55   1      0.1%
8.54   1      0.1%

EC
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct249
Distinct (%)20.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean695.4392828
Minimum7.15
Maximum5770
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size9.8 KiB

Quantile statistics

Minimum7.15
5-th percentile150
Q1360
median550
Q3900
95-th percentile1610
Maximum5770
Range5762.85
Interquartile range (IQR)540

Descriptive statistics

Standard deviation536.8190624
Coefficient of variation (CV)0.7719136317
Kurtosis13.86554926
Mean695.4392828
Median Absolute Deviation (MAD)250
Skewness2.740791102
Sum863040.15
Variance288174.7058
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
440                 25     2.0%
500                 23     1.9%
400                 20     1.6%
650                 20     1.6%
290                 19     1.5%
430                 19     1.5%
460                 18     1.5%
340                 18     1.5%
450                 18     1.5%
470                 18     1.5%
Other values (239)  1043   84.0%

Value  Count  Frequency (%)
7.15   1      0.1%
55     1      0.1%
60     4      0.3%
70     1      0.1%
75     2      0.2%
80     3      0.2%
87     1      0.1%
90     4      0.3%
100    10     0.8%
105    1      0.1%

Value  Count  Frequency (%)
5770   1      0.1%
4450   1      0.1%
4420   1      0.1%
3740   1      0.1%
3720   1      0.1%
3470   1      0.1%
3310   1      0.1%
3140   1      0.1%
3050   1      0.1%
3030   1      0.1%

TDS
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct601
Distinct (%)48.4%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean358.0572119
Minimum30
Maximum2766
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size9.8 KiB

Quantile statistics

Minimum30
5-th percentile82
Q1186
median277
Q3456
95-th percentile843
Maximum2766
Range2736
Interquartile range (IQR)270

Descriptive statistics

Standard deviation280.9793428
Coefficient of variation (CV)0.7847330914
Kurtosis13.07617928
Mean358.0572119
Median Absolute Deviation (MAD)123
Skewness2.742833455
Sum444349
Variance78949.39108
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
236                 7      0.6%
99                  7      0.6%
207                 7      0.6%
223                 7      0.6%
200                 7      0.6%
204                 7      0.6%
262                 6      0.5%
183                 6      0.5%
267                 6      0.5%
227                 6      0.5%
Other values (591)  1175   94.7%

Value  Count  Frequency (%)
30     1      0.1%
35     1      0.1%
36     1      0.1%
40     1      0.1%
41     1      0.1%
42     1      0.1%
43     2      0.2%
45     2      0.2%
46     2      0.2%
47     2      0.2%

Value  Count  Frequency (%)
2766   1      0.1%
2560   1      0.1%
2335   1      0.1%
1914   1      0.1%
1872   1      0.1%
1858   1      0.1%
1769   1      0.1%
1682   1      0.1%
1606   1      0.1%
1586   1      0.1%

TH
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct379
Distinct (%)30.5%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean215.0153102
Minimum20
Maximum1945
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size9.8 KiB

Quantile statistics

Minimum20
5-th percentile53
Q1123
median184
Q3267
95-th percentile466
Maximum1945
Range1925
Interquartile range (IQR)144

Descriptive statistics

Standard deviation156.7872729
Coefficient of variation (CV)0.729191204
Kurtosis30.89449597
Mean215.0153102
Median Absolute Deviation (MAD)68
Skewness3.965647141
Sum266834
Variance24582.24896
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
158                 15     1.2%
198                 14     1.1%
163                 13     1.0%
157                 12     1.0%
54                  12     1.0%
50                  12     1.0%
177                 12     1.0%
59                  12     1.0%
223                 11     0.9%
124                 10     0.8%
Other values (369)  1118   90.1%

Value  Count  Frequency (%)
20     2      0.2%
24     1      0.1%
25     3      0.2%
26     1      0.1%
29     2      0.2%
30     3      0.2%
33     1      0.1%
35     5      0.4%
37     1      0.1%
38     1      0.1%

Value  Count  Frequency (%)
1945   1      0.1%
1723   1      0.1%
1665   1      0.1%
1546   1      0.1%
1262   1      0.1%
1106   1      0.1%
914    1      0.1%
900    1      0.1%
770    1      0.1%
736    1      0.1%

Alkalinity
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct337
Distinct (%)27.2%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean178.4963739
Minimum15
Maximum765
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size9.8 KiB

Quantile statistics

Minimum15
5-th percentile44
Q1105
median158
Q3228
95-th percentile377
Maximum765
Range750
Interquartile range (IQR)123

Descriptive statistics

Standard deviation104.9320142
Coefficient of variation (CV)0.5878663635
Kurtosis3.034851964
Mean178.4963739
Median Absolute Deviation (MAD)59
Skewness1.339800088
Sum221514
Variance11010.72761
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
185                 15     1.2%
129                 14     1.1%
50                  13     1.0%
195                 13     1.0%
119                 13     1.0%
157                 12     1.0%
153                 12     1.0%
134                 12     1.0%
105                 12     1.0%
55                  11     0.9%
Other values (327)  1114   89.8%

Value  Count  Frequency (%)
15     1      0.1%
20     3      0.2%
21     1      0.1%
24     1      0.1%
25     7      0.6%
26     1      0.1%
29     2      0.2%
30     5      0.4%
33     3      0.2%
34     2      0.2%

Value  Count  Frequency (%)
765    1      0.1%
750    1      0.1%
695    1      0.1%
653    1      0.1%
644    1      0.1%
610    1      0.1%
604    1      0.1%
599    1      0.1%
590    1      0.1%
556    1      0.1%

Calcium
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct125
Distinct (%)10.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean43.94198227
Minimum0
Maximum497
Zeros1
Zeros (%)0.1%
Negative0
Negative (%)0.0%
Memory size9.8 KiB

Quantile statistics

Minimum0
5-th percentile12
Q126
median39
Q353
95-th percentile94
Maximum497
Range497
Interquartile range (IQR)27

Descriptive statistics

Standard deviation30.61219216
Coefficient of variation (CV)0.6966502323
Kurtosis45.84042566
Mean43.94198227
Median Absolute Deviation (MAD)13
Skewness4.465762074
Sum54532
Variance937.1063086
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
40                  49     3.9%
20                  44     3.5%
44                  38     3.1%
22                  38     3.1%
36                  37     3.0%
42                  37     3.0%
38                  35     2.8%
30                  35     2.8%
48                  33     2.7%
24                  32     2.6%
Other values (115)  863    69.5%

Value  Count  Frequency (%)
0      1      0.1%
2      1      0.1%
4      2      0.2%
6      9      0.7%
8      14     1.1%
10     30     2.4%
12     24     1.9%
13     3      0.2%
14     23     1.9%
15     2      0.2%

Value  Count  Frequency (%)
497    1      0.1%
276    1      0.1%
253    1      0.1%
234    1      0.1%
206    1      0.1%
192    1      0.1%
181    1      0.1%
178    1      0.1%
173    1      0.1%
170    1      0.1%

Magnesium
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct106
Distinct (%)8.5%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean25.61724416
Minimum-4
Maximum345
Zeros5
Zeros (%)0.4%
Negative1
Negative (%)0.1%
Memory size9.8 KiB

Quantile statistics

Minimum-4
5-th percentile4
Q110
median19
Q334
95-th percentile67
Maximum345
Range349
Interquartile range (IQR)24

Descriptive statistics

Standard deviation25.83513473
Coefficient of variation (CV)1.008505621
Kurtosis37.75728504
Mean25.61724416
Median Absolute Deviation (MAD)10
Skewness4.455767424
Sum31791
Variance667.4541863
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Value              Count  Frequency (%)
6                  55     4.4%
13                 49     3.9%
12                 44     3.5%
10                 42     3.4%
17                 41     3.3%
4                  38     3.1%
9                  37     3.0%
7                  37     3.0%
16                 35     2.8%
5                  34     2.7%
Other values (96)  829    66.8%

Value  Count  Frequency (%)
-4     1      0.1%
0      5      0.4%
1      20     1.6%
2      19     1.5%
3      8      0.6%
4      38     3.1%
5      34     2.7%
6      55     4.4%
7      37     3.0%
8      30     2.4%

Value  Count  Frequency (%)
345    1      0.1%
300    1      0.1%
254    1      0.1%
210    1      0.1%
171    1      0.1%
160    1      0.1%
154    1      0.1%
151    1      0.1%
134    1      0.1%
123    1      0.1%

Sodium
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct192
Distinct (%)15.5%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean49.99194198
Minimum0
Maximum820
Zeros7
Zeros (%)0.6%
Negative0
Negative (%)0.0%
Memory size9.8 KiB

Quantile statistics

Minimum0
5-th percentile3
Q117
median30
Q365
95-th percentile155
Maximum820
Range820
Interquartile range (IQR)48

Descriptive statistics

Standard deviation61.03354392
Coefficient of variation (CV)1.220867634
Kurtosis30.73986706
Mean49.99194198
Median Absolute Deviation (MAD)18
Skewness4.14940391
Sum62040
Variance3725.093483
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
20                  65     5.2%
1                   30     2.4%
16                  28     2.3%
18                  28     2.3%
15                  27     2.2%
21                  26     2.1%
9                   25     2.0%
31                  24     1.9%
17                  24     1.9%
19                  23     1.9%
Other values (182)  941    75.8%

Value  Count  Frequency (%)
0      7      0.6%
1      30     2.4%
2      18     1.5%
3      17     1.4%
4      15     1.2%
5      17     1.4%
6      8      0.6%
7      8      0.6%
8      13     1.0%
9      25     2.0%

Value  Count  Frequency (%)
820    1      0.1%
505    1      0.1%
474    1      0.1%
470    1      0.1%
423    1      0.1%
403    1      0.1%
400    1      0.1%
341    1      0.1%
321    1      0.1%
312    2      0.2%

Potassium
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION

Distinct321
Distinct (%)25.9%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean13.16253022
Minimum0
Maximum332
Zeros5
Zeros (%)0.4%
Negative0
Negative (%)0.0%
Memory size9.8 KiB

Quantile statistics

Minimum0
5-th percentile0.5
Q11.6
median3.8
Q310.1
95-th percentile58
Maximum332
Range332
Interquartile range (IQR)8.5

Descriptive statistics

Standard deviation29.67099414
Coefficient of variation (CV)2.254201407
Kurtosis32.15284121
Mean13.16253022
Median Absolute Deviation (MAD)2.7
Skewness5.043696058
Sum16334.7
Variance880.3678933
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
1                   39     3.1%
2                   28     2.3%
1.8                 28     2.3%
3                   27     2.2%
1.1                 27     2.2%
0.8                 26     2.1%
0.1                 24     1.9%
0.7                 24     1.9%
1.2                 23     1.9%
1.4                 21     1.7%
Other values (311)  974    78.5%

Value  Count  Frequency (%)
0      5      0.4%
0.1    24     1.9%
0.2    7      0.6%
0.3    8      0.6%
0.4    12     1.0%
0.5    14     1.1%
0.6    12     1.0%
0.7    24     1.9%
0.8    26     2.1%
0.9    21     1.7%

Value  Count  Frequency (%)
332    1      0.1%
256    2      0.2%
232.1  1      0.1%
229    1      0.1%
207.2  1      0.1%
206.8  1      0.1%
196    1      0.1%
180.6  1      0.1%
180.3  1      0.1%
178.6  1      0.1%

Bicarbonate
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct380
Distinct (%)30.6%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean215.7993554
Minimum18
Maximum933
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size9.8 KiB

Quantile statistics

Minimum18
5-th percentile55
Q1128
median192
Q3275
95-th percentile455
Maximum933
Range915
Interquartile range (IQR)147

Descriptive statistics

Standard deviation126.5073791
Coefficient of variation (CV)0.5862268629
Kurtosis3.12407848
Mean215.7993554
Median Absolute Deviation (MAD)71
Skewness1.352002421
Sum267807
Variance16004.11697
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
187                 13     1.0%
223                 13     1.0%
238                 12     1.0%
128                 12     1.0%
134                 12     1.0%
154                 12     1.0%
226                 11     0.9%
61                  11     0.9%
67                  11     0.9%
146                 11     0.9%
Other values (370)  1123   90.5%

Value  Count  Frequency (%)
18     1      0.1%
24     3      0.2%
25     1      0.1%
29     1      0.1%
30     4      0.3%
31     3      0.2%
32     1      0.1%
35     2      0.2%
36     3      0.2%
37     1      0.1%

Value  Count  Frequency (%)
933    1      0.1%
915    1      0.1%
848    1      0.1%
797    1      0.1%
770    1      0.1%
744    1      0.1%
737    1      0.1%
720    1      0.1%
670    1      0.1%
665    1      0.1%

Chloride
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct279
Distinct (%)22.5%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean92.17002417
Minimum0
Maximum1753
Zeros2
Zeros (%)0.2%
Negative0
Negative (%)0.0%
Memory size9.8 KiB

Quantile statistics

Minimum0
5-th percentile10
Q126
median55
Q3110
95-th percentile291
Maximum1753
Range1753
Interquartile range (IQR)84

Descriptive statistics

Standard deviation127.1443335
Coefficient of variation (CV)1.379454271
Kurtosis48.13562126
Mean92.17002417
Median Absolute Deviation (MAD)33
Skewness5.49142232
Sum114383
Variance16165.68155
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
26                  32     2.6%
17                  28     2.3%
24                  27     2.2%
7                   26     2.1%
10                  25     2.0%
43                  24     1.9%
19                  23     1.9%
55                  23     1.9%
36                  21     1.7%
29                  20     1.6%
Other values (269)  992    79.9%

Value  Count  Frequency (%)
0      2      0.2%
2      5      0.4%
3      2      0.2%
5      12     1.0%
7      26     2.1%
8      4      0.3%
10     25     2.0%
11     1      0.1%
12     20     1.6%
13     5      0.4%

Value  Count  Frequency (%)
1753   1      0.1%
1410   1      0.1%
1314   1      0.1%
1065   1      0.1%
986    1      0.1%
985    1      0.1%
758    1      0.1%
754    1      0.1%
736    1      0.1%
707    1      0.1%

Sulphate
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct115
Distinct (%)9.3%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean26.22320709
Minimum-3
Maximum434
Zeros82
Zeros (%)6.6%
Negative1
Negative (%)0.1%
Memory size9.8 KiB

Quantile statistics

Minimum-3
5-th percentile0
Q14
median17
Q338
95-th percentile82
Maximum434
Range437
Interquartile range (IQR)34

Descriptive statistics

Standard deviation30.66201248
Coefficient of variation (CV)1.169270119
Kurtosis29.43368742
Mean26.22320709
Median Absolute Deviation (MAD)14
Skewness3.466286451
Sum32543
Variance940.1590094
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
0                   82     6.6%
1                   78     6.3%
3                   58     4.7%
4                   51     4.1%
2                   44     3.5%
5                   43     3.5%
11                  29     2.3%
9                   29     2.3%
10                  26     2.1%
18                  25     2.0%
Other values (105)  776    62.5%

Value  Count  Frequency (%)
-3     1      0.1%
0      82     6.6%
1      78     6.3%
2      44     3.5%
3      58     4.7%
4      51     4.1%
5      43     3.5%
6      23     1.9%
7      25     2.0%
8      21     1.7%

Value  Count  Frequency (%)
434    1      0.1%
250    1      0.1%
210    1      0.1%
186    1      0.1%
160    1      0.1%
152    1      0.1%
145    1      0.1%
142    3      0.2%
138    1      0.1%
135    1      0.1%

Fluoride
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct154
Distinct (%)12.4%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean0.3985173247
Minimum0.02
Maximum3.94
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size9.8 KiB

Quantile statistics

Minimum0.02
5-th percentile0.07
Q10.16
median0.27
Q30.47
95-th percentile1.2
Maximum3.94
Range3.92
Interquartile range (IQR)0.31

Descriptive statistics

Standard deviation0.419843934
Coefficient of variation (CV)1.053514886
Kurtosis16.8172457
Mean0.3985173247
Median Absolute Deviation (MAD)0.13
Skewness3.372375932
Sum494.56
Variance0.1762689289
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
0.11                42     3.4%
0.18                35     2.8%
0.13                35     2.8%
0.29                35     2.8%
0.16                34     2.7%
0.21                34     2.7%
0.14                32     2.6%
0.12                32     2.6%
0.17                31     2.5%
0.19                31     2.5%
Other values (144)  900    72.5%

Value  Count  Frequency (%)
0.02   1      0.1%
0.03   3      0.2%
0.04   11     0.9%
0.05   9      0.7%
0.06   21     1.7%
0.07   20     1.6%
0.08   16     1.3%
0.09   23     1.9%
0.1    28     2.3%
0.11   42     3.4%

Value  Count  Frequency (%)
3.94   1      0.1%
3.6    1      0.1%
3.54   1      0.1%
3.32   1      0.1%
3.1    1      0.1%
3.08   1      0.1%
2.91   1      0.1%
2.5    1      0.1%
2.49   1      0.1%
2.41   1      0.1%

is_drinkable
Categorical

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 9.8 KiB

Value  Count
1      656
0      585

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters: 1241
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row1
2nd row1
3rd row0
4th row1
5th row1

Common Values

Value  Count  Frequency (%)
1      656    52.9%
0      585    47.1%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
1      656    52.9%
0      585    47.1%

Most occurring characters

Value  Count  Frequency (%)
1      656    52.9%
0      585    47.1%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  1241   100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
1      656    52.9%
0      585    47.1%

Most occurring scripts

Value   Count  Frequency (%)
Common  1241   100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
1      656    52.9%
0      585    47.1%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  1241   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
1      656    52.9%
0      585    47.1%

WQI
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
UNIQUE

Distinct1241
Distinct (%)100.0%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean39.52514704
Minimum2.92080857
Maximum271.4228748
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size9.8 KiB

Quantile statistics

Minimum2.92080857
5-th percentile10.69497045
Q120.47250591
median30.77209109
Q347.56704838
95-th percentile96.1546608
Maximum271.4228748
Range268.5020663
Interquartile range (IQR)27.09454248

Descriptive statistics

Standard deviation30.85646353
Coefficient of variation (CV)0.780679285
Kurtosis10.00919359
Mean39.52514704
Median Absolute Deviation (MAD)11.86706255
Skewness2.637476657
Sum49050.70747
Variance952.1213414
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
19.03235114          1      0.1%
23.94772587          1      0.1%
14.74641444          1      0.1%
14.93827374          1      0.1%
17.39788878          1      0.1%
25.75117264          1      0.1%
29.4523277           1      0.1%
16.68437973          1      0.1%
26.14416028          1      0.1%
29.9555589           1      0.1%
Other values (1231)  1231   99.2%

Value        Count  Frequency (%)
2.92080857   1      0.1%
4.741497468  1      0.1%
4.788191687  1      0.1%
5.125883257  1      0.1%
5.464181808  1      0.1%
5.590406731  1      0.1%
5.907261341  1      0.1%
5.911321231  1      0.1%
5.955143778  1      0.1%
5.963774771  1      0.1%

Value        Count  Frequency (%)
271.4228748  1      0.1%
225.7888095  1      0.1%
222.6577585  1      0.1%
215.1332868  1      0.1%
205.0670827  1      0.1%
204.8788207  1      0.1%
203.5143277  1      0.1%
191.1416121  1      0.1%
183.8170026  1      0.1%
183.0700216  1      0.1%

WQC
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 4
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 9.8 KiB

Value      Count
Good       513
Excellent  436
Poor       173
Very Poor  119

Length

Max length9
Median length4
Mean length6.236099919
Min length4

Characters and Unicode

Total characters: 7739
Distinct characters: 15
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st rowExcellent
2nd rowExcellent
3rd rowVery Poor
4th rowGood
5th rowExcellent

Common Values

Value      Count  Frequency (%)
Good       513    41.3%
Excellent  436    35.1%
Poor       173    13.9%
Very Poor  119    9.6%

Length

Histogram of lengths of the category

Category Frequency Plot

Value      Count  Frequency (%)
good       513    37.7%
excellent  436    32.1%
poor       292    21.5%
very       119    8.8%

Most occurring characters

ValueCountFrequency (%)
o1610
20.8%
e991
12.8%
l872
11.3%
G513
 
6.6%
d513
 
6.6%
E436
 
5.6%
x436
 
5.6%
c436
 
5.6%
n436
 
5.6%
t436
 
5.6%
Other values (5)1060
13.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter6260
80.9%
Uppercase Letter1360
 
17.6%
Space Separator119
 
1.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o1610
25.7%
e991
15.8%
l872
13.9%
d513
 
8.2%
x436
 
7.0%
c436
 
7.0%
n436
 
7.0%
t436
 
7.0%
r411
 
6.6%
y119
 
1.9%
Uppercase Letter
ValueCountFrequency (%)
G513
37.7%
E436
32.1%
P292
21.5%
V119
 
8.8%
Space Separator
ValueCountFrequency (%)
119
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin7620
98.5%
Common119
 
1.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
o1610
21.1%
e991
13.0%
l872
11.4%
G513
 
6.7%
d513
 
6.7%
E436
 
5.7%
x436
 
5.7%
c436
 
5.7%
n436
 
5.7%
t436
 
5.7%
Other values (4)941
12.3%
Common
ValueCountFrequency (%)
119
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII7739
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o1610
20.8%
e991
12.8%
l872
11.3%
G513
 
6.6%
d513
 
6.6%
E436
 
5.6%
x436
 
5.6%
c436
 
5.6%
n436
 
5.6%
t436
 
5.6%
Other values (5)1060
13.7%

Interactions

Correlations

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
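
A minimal pure-Python sketch of that calculation follows: rank both variables (ties receive the average of their positions), then divide the covariance of the ranks by the product of their standard deviations. The helper names `average_ranks`, `pearson_r`, and `spearman_rho` are introduced here for illustration, and the input is a toy example, not data from this report.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_rho(x, y):
    """Pearson correlation applied to the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Toy monotonic but nonlinear relationship: y = x**3.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y), pearson_r(x, y))
```

On this toy input ρ is 1 (up to floating-point error) while Pearson's r is below 1, which is the sense in which ρ picks up nonlinear monotonic relationships.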

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
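
The sketch below implements that definition directly and checks the invariance property on a small input. The function name `pearson_r` is a hypothetical helper; the five EC and TDS values are read from the first rows of the sample at the end of this report, and five rows are far too few to reproduce the report's own coefficients.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# EC and TDS for the first five sample rows of this report.
ec = [210.0, 310.0, 580.0, 390.0, 460.0]
tds = [105, 157, 282, 191, 234]

r = pearson_r(ec, tds)

# Shifting and (positively) rescaling one variable leaves r unchanged.
ec_rescaled = [2 * v + 3 for v in ec]
r_rescaled = pearson_r(ec_rescaled, tds)

print(r, r_rescaled)
```

Note that the invariance holds for positive scale factors; multiplying one variable by a negative number flips the sign of r.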

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
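
The pair-counting definition can be sketched as follows. This is the tau-a form that matches the wording above, with tied pairs counted as neither concordant nor discordant; library implementations often use a tie-adjusted variant (tau-b), so results may differ when ties are present. The function name `kendall_tau_a` and the toy input are assumptions for illustration.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs; ties count as neither."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n = len(x)
    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs

# Toy example: one swapped pair out of six possible pairs.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)
print(tau)
```

Here five pairs are concordant and one is discordant, so τ = (5 - 1) / 6.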

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
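
For orientation, here is a sketch of a bias-corrected Cramér's V in pure Python. It uses the commonly cited form of Bergsma's correction (shrinking φ² and the table dimensions by terms in 1/(n-1)); whether the report computes it in exactly this way is an assumption, and the function name `cramers_v_corrected` and the synthetic labels are illustrative only.

```python
import math
from collections import Counter

def cramers_v_corrected(a, b):
    """Bias-corrected Cramér's V for two equal-length sequences of labels."""
    n = len(a)
    joint = Counter(zip(a, b))
    row_totals = Counter(a)
    col_totals = Counter(b)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for ra, na in row_totals.items():
        for cb, nb in col_totals.items():
            expected = na * nb / n
            observed = joint.get((ra, cb), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    r, k = len(row_totals), len(col_totals)

    # Assumed form of the correction: shrink phi^2 and the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0

# Synthetic labels: perfectly associated vs. independent pairs of columns.
perfect = cramers_v_corrected(["x"] * 50 + ["y"] * 50, ["p"] * 50 + ["q"] * 50)
independent = cramers_v_corrected(["x", "x", "y", "y"] * 25, ["p", "q", "p", "q"] * 25)
print(perfect, independent)
```

With the correction, exactly independent columns give 0 and a perfect one-to-one association still gives 1.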

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for it.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.

Sample

First rows

Row   District    Village         pH    EC      TDS   TH   Alkalinity  Calcium  Magnesium  Sodium  Potassium  Bicarbonate  Chloride  Sulphate  Fluoride  is_drinkable  WQI        WQC
0     Angul       Chauliakata     7.22  210.0   105   85   60          22       7          5       4.2        73           21        10        0.27      1             19.032351  Excellent
1     Angul       Godibandha      7.54  310.0   157   100  85          20       12         21      4.8        104          43        5         0.12      1             15.589984  Excellent
2     Angul       Samal           8.08  580.0   282   125  200         20       18         72      3.5        244          21        28        1.52      0             86.264578  Very Poor
3     Angul       Sipur           8.25  390.0   191   150  145         34       16         16      5.7        177          30        3         0.31      1             31.931635  Good
4     Angul       Khamar-1        7.64  460.0   234   165  125         42       15         26      5.0        153          71        0         0.15      1             18.932642  Excellent
5     Angul       Srirampur       7.88  390.0   196   160  105         36       17         15      1.2        128          60        4         0.27      1             23.675539  Excellent
6     Angul       Pallahara       7.99  480.0   244   145  110         34       15         41      1.8        134          72        15        0.11      1             17.458106  Excellent
7     Angul       Jamardihi       7.62  90.0    43    40   25          10       4          1       0.7        31           10        2         0.13      1             12.470002  Excellent
8     Angul       Sendhogram      7.81  820.0   428   200  245         24       34         95      2.2        299          82        44        1.59      0             88.117828  Very Poor
9     Angul       Bhogabereni     7.42  2440.0  1292  515  425         78       78         312     20.6       519          363       186       0.94      0             74.916497  Poor

Last rows

Row   District    Village         pH    EC      TDS   TH   Alkalinity  Calcium  Magnesium  Sodium  Potassium  Bicarbonate  Chloride  Sulphate  Fluoride  is_drinkable  WQI        WQC
1231  Sundargarh  R-27 Sector-7   8.26  270.0   138   113  98          29       10         9       1.9        120          24        5         0.27      1             26.533711  Good
1232  Sundargarh  R-29 Sector-9   8.28  260.0   128   108  110         24       12         9       1.2        134          12        4         0.27      1             26.241266  Good
1233  Sundargarh  R-30 Sector-13  8.11  190.0   100   78   55          26       3          6       2.9        67           22        7         0.18      1             20.994659  Excellent
1234  Sundargarh  R-31 Sector-14  8.20  170.0   89    69   71          22       3          5       3.7        87           10        2         0.30      1             27.922359  Good
1235  Sundargarh  R-32 Sector-20  8.27  330.0   168   113  93          31       9          21      4.8        114          31        16        0.18      1             24.550178  Excellent
1236  Sundargarh  R-33 Sector 18  8.22  340.0   176   93   82          31       4          31      4.4        100          34        23        0.14      1             21.589240  Excellent
1237  Sundargarh  R-34 Sector-17  7.90  240.0   118   93   49          18       12         8       4.5        60           41        5         0.14      1             19.066396  Excellent
1238  Sundargarh  R-36 Sector-15  8.27  340.0   156   147  131         29       18         9       2.2        160          14        5         0.26      1             27.092631  Good
1239  Sundargarh  R-37 Vedvyas    8.26  740.0   378   270  126         77       19         44      1.5        154          156       5         0.21      0             25.709087  Good
1240  Sundargarh  R-38 Kalunga    8.24  340.0   173   152  77          51       6          6       2.4        94           53        9         0.23      1             25.023410  Good